DBZ(3Z) DBZ(3Z)
NAME
dbminit, fetch, store, dbmclose - somewhat dbm-compatible
database routines
dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug -
database routines

SYNOPSIS
#include <dbz.h>

dbminit(base)
char *base;

datum
fetch(key)
datum key;

store(key, value)
datum key;
datum value;

dbmclose()

dbzfresh(base, size, fieldsep, cmap, tagmask)
char *base;
long size;
int fieldsep;
int cmap;
long tagmask;

dbzagain(base, oldbase)
char *base;
char *oldbase;

datum
dbzfetch(key)
datum key;

dbzstore(key, value)
datum key;
datum value;

dbzsync()

long
dbzsize(nentries)
long nentries;

dbzincore(newvalue)

dbzcancel()

dbzdebug(newvalue)

3 Feb 1991 1
DESCRIPTION
These functions provide an indexing system for rapid random
access to a text file (the base file). Subject to certain
constraints, they are call-compatible with dbm(3), although
they also provide some extensions. (Note that they are not
file-compatible with dbm or any variant thereof.)

In principle, dbz stores key-value pairs, where both key
and value are arbitrary sequences of bytes, specified to
the functions by values of type datum, typedefed in the
header file to be a structure with members dptr (a value
of type char * pointing to the bytes) and dsize (a value
of type int indicating how long the byte sequence is).

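As a rough illustration of the datum structure just described,
the sketch below declares a stand-in typedef and wraps a C string
in it. The helper name string_datum is hypothetical and not part
of dbz, and the choice to count the terminating NUL as part of the
key is an assumption made for the example; a real program would
take the typedef from <dbz.h>.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the typedef described above; a real program would
 * get it from <dbz.h> rather than declaring it itself. */
typedef struct {
    char *dptr;   /* pointer to the bytes */
    int   dsize;  /* length of the byte sequence */
} datum;

/* Hypothetical helper (not part of dbz): wrap a C string as a
 * datum, counting the terminating NUL as part of the sequence. */
datum
string_datum(char *s)
{
    datum d;

    assert(s != NULL);
    d.dptr = s;
    d.dsize = (int)strlen(s) + 1;
    return d;
}
```

The datum only points at the caller's bytes; nothing is copied, so
the string must stay valid for as long as the datum is in use.
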
In practice, dbz is more restricted than dbm. A dbz
database must be an index into a base file, with the
database values being fseek(3) offsets into the base file.
Each such value must ``point to'' a place in the base file
where the corresponding key sequence is found. A key can
be no longer than DBZMAXKEY (a constant defined in the
header file) bytes. No key can be an initial subsequence
of another, which in most applications requires that keys
be either bracketed or terminated in some way (see the
discussion of the fieldsep parameter of dbzfresh, below,
for a fine point on terminators).

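The initial-subsequence rule can be checked mechanically. The
sketch below is only an illustration; the function name
is_initial_subsequence is hypothetical and dbz itself is not known
to expose such a check. It shows why bracketing or terminating
keys avoids the problem.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the alen bytes at a are an
 * initial subsequence of the blen bytes at b, else 0.  Two equal
 * keys also count, since each is a prefix of the other. */
int
is_initial_subsequence(const char *a, int alen, const char *b, int blen)
{
    assert(a != NULL && b != NULL && alen >= 0 && blen >= 0);
    return alen <= blen && memcmp(a, b, (size_t)alen) == 0;
}
```

With bare keys, "abc" is an initial subsequence of "abcd" and the
pair is not allowed. Once both are bracketed as "<abc>" and
"<abcd>", or terminated with a NUL that is counted in the length,
neither is a prefix of the other.
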
Dbminit opens a database, an index into the base file
base, consisting of files base.dir and base.pag which must
already exist. (If the database is new, they should be
zero-length files.) Subsequent accesses go to that
database until dbmclose is called to close the database.
The base file need not exist at the time of the dbminit,
but it must exist before accesses are attempted.

Fetch searches the database for the specified key,
returning the corresponding value if any. Store stores the
key-value pair in the database. Store will fail unless the
database files are writeable. See below for a complication
arising from case mapping.

Dbzfresh is a variant of dbminit for creating a new
database with more control over details. Unlike for
dbminit, the database files need not exist: they will be
created if necessary, and truncated in any case.

Dbzfresh's size parameter specifies the size of the first
hash table within the database, in key-value pairs.
Performance will be best if size is a prime number and the
number of key-value pairs stored in the database does not
exceed about 2/3 of size. (The dbzsize function, given
the expected number of key-value pairs, will suggest a
database size that meets these criteria.) Assuming that
an fseek offset is 4 bytes, the .pag file will be 4*size
bytes (the .dir file is tiny and roughly constant in size)
until the number of key-value pairs exceeds about 80% of
size. (Nothing awful will happen if the database grows
beyond 100% of size, but accesses will slow down somewhat
and the .pag file will grow somewhat.)

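The sizing guidance above amounts to a little arithmetic. The
sketch below picks the smallest prime that keeps the expected
entries at or below 2/3 of the table, and computes the initial
.pag size under the 4-byte offset assumption. The names
suggest_size and pag_bytes are hypothetical, and this is not
claimed to be the algorithm dbzsize actually uses.

```c
#include <assert.h>

/* Trial-division primality test; adequate for a sketch. */
int
is_prime(long n)
{
    long d;

    if (n < 2)
        return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical stand-in for dbzsize: the smallest prime p with
 * nentries <= 2/3 of p.  Not the library's own algorithm. */
long
suggest_size(long nentries)
{
    long p;

    assert(nentries >= 0);
    p = nentries + (nentries + 1) / 2;  /* ceil(1.5 * nentries) */
    while (!is_prime(p))
        p++;
    return p;
}

/* Initial .pag size in bytes, assuming 4-byte fseek offsets. */
long
pag_bytes(long size)
{
    return 4 * size;
}
```

For 100 expected entries this gives a table of 151 slots and a
.pag file of 604 bytes.
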
Dbzfresh's fieldsep parameter specifies the field separator
in the base file. If this is not NUL (0), and the
last character of a key argument is NUL, that NUL compares
equal to either a NUL or a fieldsep in the base file.
This permits use of NUL to terminate key strings without
requiring that NULs appear in the base file. The fieldsep
of a database created with dbminit is the horizontal-tab
character.

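The terminator rule can be pictured as a comparison routine. The
sketch below is an illustration only; key_matches is a
hypothetical name and the code is not taken from dbz. It compares
a key against the bytes found at a base-file offset, letting a
trailing NUL in the key match either a NUL or the field separator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: does the ksize-byte key match the bytes at
 * base?  base must point at no fewer than ksize readable bytes. */
int
key_matches(const char *key, int ksize, const char *base, int fieldsep)
{
    char last;

    assert(key != NULL && base != NULL && ksize > 0);
    /* Everything before the final byte must match exactly. */
    if (memcmp(key, base, (size_t)(ksize - 1)) != 0)
        return 0;
    last = base[ksize - 1];
    /* A trailing NUL in the key matches NUL or the separator,
     * provided the separator is not itself NUL. */
    if (key[ksize - 1] == '\0' && fieldsep != '\0')
        return last == '\0' || last == (char)fieldsep;
    return last == key[ksize - 1];
}
```

So a key of "<a@b>" plus its NUL (6 bytes) matches a base-file
line beginning "<a@b>" followed by a tab when fieldsep is the tab
character, but not a line beginning "<a@b>x".
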
For use in news systems, various forms of case mapping
(e.g. uppercase to lowercase) in keys are available. The
cmap parameter to dbzfresh is a single character specifying
which of several mapping algorithms to use. Available
algorithms are:

0      case-sensitive: no case mapping
B      same as 0
NUL    same as 0
=      case-insensitive: uppercase and lowercase
       equivalent
b      same as =
C      RFC822 message-ID rules, case-sensitive
       before `@' (with certain exceptions) and
       case-insensitive after
?      whatever the local default is, normally C

Mapping algorithm 0 (no mapping) is faster than the others
and is overwhelmingly the correct choice for most
applications. Unless compatibility constraints interfere, it is
more efficient to pre-map the keys, storing mapped keys in
the base file, than to have dbz do the mapping on every
search.

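Pre-mapping keys before they go into the base file might look
roughly like the sketch below. It is not dbz's own mapper:
premap_key is a hypothetical name, the C case simply lowercases
everything from the last `@' onward and ignores the exceptions
mentioned above, and treating a key with no `@' as wholly
case-sensitive is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical pre-mapper, not dbz's own code.  Copies the
 * NUL-terminated key src into dst (at least strlen(src)+1 bytes),
 * mapped according to cmap:
 *   '0'  no mapping
 *   '='  everything to lowercase
 *   'C'  lowercase from the last '@' onward (exceptions ignored)
 * Returns 0 on success, -1 for a cmap this sketch does not handle. */
int
premap_key(char *dst, const char *src, int cmap)
{
    const char *at;
    size_t i, from, len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    switch (cmap) {
    case '0':
        from = len;             /* lowercase nothing */
        break;
    case '=':
        from = 0;               /* lowercase everything */
        break;
    case 'C':
        at = strrchr(src, '@');
        from = (at != NULL) ? (size_t)(at - src) : len;
        break;
    default:
        return -1;
    }
    for (i = 0; i < len; i++)
        dst[i] = (i >= from) ? (char)tolower((unsigned char)src[i])
                             : src[i];
    dst[len] = '\0';
    return 0;
}
```

Under these assumptions "<Abc@Example.COM>" becomes
"<Abc@example.com>" with C and "<abc@example.com>" with =.
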
For historical reasons, fetch and store expect their key
arguments to be pre-mapped, but expect unmapped keys in
the base file. Dbzfetch and dbzstore do the same jobs but
handle all case mapping internally, so the customer need
not worry about it.

Dbz stores only the database values in its files, relying
on reference to the base file to confirm a hit on a key.
References to the base file can be minimized, greatly
speeding up searches, if a little bit of information about
the keys can be stored in the dbz files. This is ``free''
if there are some unused bits in an fseek offset, so that
the offset can be tagged with some information about the
key. The tagmask parameter of dbzfresh allows specifying
the location of unused bits. Tagmask should be a mask
with one group of contiguous 1 bits. The bits in the mask
should be unused (0) in most offsets. The bit immediately
above the mask (the flag bit) should be unused (0) in all
offsets; (dbz)store will reject attempts to store a
key-value pair in which the value has the flag bit on. Apart
from this restriction, tagging is invisible to the user.
As a special case, a tagmask of 1 means ``no tagging'',
for use with enormous base files or on systems with
unusual offset representations.

A size of 0 given to dbzfresh is synonymous with the local
default; the normal default is suitable for tables of
90-100,000 key-value pairs. A cmap of 0 (NUL) is
synonymous with the character 0, signifying no case mapping
(note that the character ? specifies the local default
mapping, normally C). A tagmask of 0 is synonymous with
the local default tag mask, normally 0x7f000000
(specifying the top bit in a 32-bit offset as the flag bit, and
the next 7 bits as the mask, which is suitable for base
files up to circa 24MB). Calling dbminit(name) with the
database files empty is equivalent to calling
dbzfresh(name,0,'\t','?',0).

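The relationship between a tag mask, its flag bit, and an offset
is plain bit arithmetic. The sketch below works it through for the
usual default mask of 0x7f000000. The function names are
hypothetical, the code is not taken from dbz, and the special
tagmask value 1 (``no tagging'') is not modelled.

```c
#include <assert.h>

/* The flag bit is the bit immediately above a contiguous mask. */
unsigned long
flag_bit(unsigned long tagmask)
{
    assert(tagmask != 0);
    return (tagmask << 1) & ~tagmask;
}

/* An offset has room for a tag only if none of the mask bits
 * are already in use. */
int
can_tag(unsigned long offset, unsigned long tagmask)
{
    return (offset & tagmask) == 0;
}

/* Per the text above, a value with the flag bit on is rejected
 * by (dbz)store. */
int
can_store(unsigned long offset, unsigned long tagmask)
{
    return (offset & flag_bit(tagmask)) == 0;
}
```

With 0x7f000000 the flag bit is 0x80000000. An offset of
0x00ffffff leaves the mask bits free and can be tagged; an offset
of 0x01000000 can still be stored but has no room for a tag.
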
When databases are regenerated periodically, as in news,
it is simplest to pick the parameters for a new database
based on the old one. This also permits some memory of
past sizes of the old database, so that a new database
size can be chosen to cover expected fluctuations.
Dbzagain is a variant of dbminit for creating a new database
as a new generation of an old database. The database
files for oldbase must exist. Dbzagain is equivalent to
calling dbzfresh with the same field separator, case
mapping, and tag mask as the old database, and a size equal
to the result of applying dbzsize to the largest number of
entries in the oldbase database and its previous 10
generations.

When many accesses are being done by the same program, dbz
is massively faster if its first hash table is in memory.
If an internal flag is 1, an attempt is made to read the
table in when the database is opened, and dbmclose writes
it out to disk again (if it was read successfully and has
been modified). Dbzincore sets the flag to newvalue
(which should be 0 or 1) and returns the previous value;
this does not affect the status of a database that has
already been opened. The default is 0. The attempt to
read the table in may fail due to memory shortage; in this
case dbz quietly falls back on its default behavior.

Stores to an in-memory database are not (in general)
written out to the file until dbmclose or dbzsync, so if
robustness in the presence of crashes or concurrent
accesses is crucial, in-memory databases should probably
be avoided.

Dbzsync causes all buffers etc. to be flushed out to the
files. It is typically used as a precaution against
crashes or concurrent accesses when a dbz-using process
will be running for a long time. It is a somewhat
expensive operation, especially for an in-memory database.

Dbzcancel cancels any pending writes from buffers. This
is typically useful only for in-core databases, since
writes are otherwise done immediately. Its main purpose
is to let a child process, in the wake of a fork, do a
dbmclose without writing its parent's data to disk.

If dbz has been compiled with debugging facilities
available (which makes it bigger and a bit slower), dbzdebug
alters the value (and returns the previous value) of an
internal flag which (when 1; default is 0) causes verbose
and cryptic debugging output on standard output.

Concurrent reading of databases is fairly safe, but there
is no (inter)locking, so concurrent updating is not.
The database files include a record of the byte order of
the processor creating the database, and accesses by
processors with different byte order will work, although they
will be slightly slower. Byte order is preserved by
dbzagain. However, agreement on the size and internal
structure of an fseek offset is necessary, as is consensus on
the character set.

An open database occupies three stdio streams and their
corresponding file descriptors; a fourth is needed for an
in-memory database. Apart from stdio buffers, memory
consumption is negligible unless the database is in-memory.

SEE ALSO
dbz(1), dbm(3)

DIAGNOSTICS
Functions returning int values return 0 for success, -1
for failure. Functions returning datum values return a
value with dptr set to NULL for failure. Dbminit attempts
to have errno set plausibly on return, but otherwise this
is not guaranteed. An errno of EDOM from dbminit
indicates that the database did not appear to be in dbz
format.

HISTORY
The original dbz was written by Jon Zeeff
(zeeff@b-tech.ann-arbor.mi.us). Later contributions by David
Butler and Mark Moraes. Extensive reworking, including this
documentation, by Henry Spencer (henry@zoo.toronto.edu) as
part of the C News project. Hashing function by Peter
Honeyman.

BUGS
The dptr members of returned datum values point to static
storage which is overwritten by later calls.

Unlike dbm, dbz will misbehave if an existing key-value
pair is `overwritten' by a new (dbz)store with the same
key. The user is responsible for avoiding this by using
(dbz)fetch first to check for duplicates; an internal
optimization remembers the result of the first search so
there is minimal overhead in this.

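The fetch-before-store discipline can be wrapped in a small
helper. In the sketch below, store_unique is a hypothetical name,
and the datum typedef, dbzfetch, and dbzstore are simplistic
in-memory stand-ins included only so the example compiles without
the library; they follow the documented return conventions (dptr
of NULL, or -1, for failure) but are not dbz's implementation. A
real program would include <dbz.h> and drop the stand-in section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Stand-ins; real code would include <dbz.h> instead ---- */
typedef struct { char *dptr; int dsize; } datum;

#define NSLOTS 8
static datum keys[NSLOTS], values[NSLOTS];
static int nused = 0;

datum
dbzfetch(datum key)
{
    datum none = { NULL, 0 };
    int i;

    for (i = 0; i < nused; i++)
        if (keys[i].dsize == key.dsize &&
            memcmp(keys[i].dptr, key.dptr, (size_t)key.dsize) == 0)
            return values[i];
    return none;
}

int
dbzstore(datum key, datum value)
{
    if (nused >= NSLOTS)
        return -1;
    keys[nused] = key;      /* stores pointers only, no copying */
    values[nused] = value;
    nused++;
    return 0;
}
/* ---- End of stand-ins ---- */

/* Hypothetical wrapper: returns 1 if the key is already present
 * (nothing is stored), 0 if stored, -1 if the store failed. */
int
store_unique(datum key, datum value)
{
    if (dbzfetch(key).dptr != NULL)
        return 1;
    return dbzstore(key, value);
}
```

A caller would treat a return of 1 as ``duplicate, skipped''
rather than as an error.
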
Waiting until after dbminit to bring the base file into
existence will fail if chdir(2) has been used meanwhile.

The RFC822 case mapper implements only a first
approximation to the hideously-complex RFC822 case rules.

The prime finder in dbzsize is not particularly quick.

Should implement the dbm functions delete, firstkey, and
nextkey.

On C implementations which trap integer overflow, dbz will
refuse to (dbz)store an fseek offset equal to the greatest
representable positive number, as this would cause
overflow in the biased representation used.

Dbzagain perhaps ought to notice when many offsets in the
old database were too big for tagging, and shrink the tag
mask to match.

Marking dbz's file descriptors close-on-exec would be a
better approach to the problem dbzcancel tries to address,
but that's harder to do portably.